Centrality

SNA Week 4

Matt Pietryka

Key Terms

  • Centrality: node-level activity, importance, or prominence
  • Centralization: graph-level variation in individual centrality scores

Degree Centrality counts the number of nodes to which \(n_i\) is adjacent

\[ d_i = \sum^g_{j = 1} x_{ij} \]

  • Possible values depend on \(g\), the number of nodes in the network
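
As a minimal sketch of the formula above, the degree of each node is just the row sum of the sociomatrix. The five-node network below is made up for illustration and is not course data.

```r
# Hypothetical undirected network on 5 nodes (illustrative only).
x <- matrix(0, nrow = 5, ncol = 5)
edges <- rbind(c(1, 2), c(1, 3), c(1, 4), c(2, 3), c(4, 5))
x[edges] <- 1
x[edges[, 2:1]] <- 1           # symmetrize: ties are undirected

d_manual <- rowSums(x)          # d_i = sum over j of x_ij

# igraph should report the same counts for the same sociomatrix
g_small <- igraph::graph_from_adjacency_matrix(x, mode = "undirected")
d_igraph <- igraph::degree(g_small)

print(d_manual)
```

Node 1 has three ties, node 5 has one, and the rest have two.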

Degree Centrality counts the number of nodes to which \(n_i\) is adjacent

  igraph::degree(er_example)  %>% summary()
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     3.0     8.0    11.0    10.6    12.0    18.0
  igraph::degree(ws_example)  %>% summary()
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       7       9      10      10      11      13
  igraph::degree(b_example)  %>% summary()
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   11.00   11.00   12.00   20.68   17.00   90.00
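
The objects `er_example`, `ws_example`, and `b_example` are not defined anywhere in these slides. The sketch below shows one way such graphs could be generated. The seed, the size of 100 nodes, and every model parameter are assumptions picked to be roughly in line with the degree summaries printed above. They are not the settings actually used.

```r
set.seed(4)   # arbitrary seed; the slides do not report one

g <- 100      # assumed network size

# Erdos-Renyi: every possible edge is present with probability 0.1
er_example <- igraph::sample_gnp(n = g, p = 0.1)

# Watts-Strogatz: ring lattice, 5 neighbours on each side, 10% of edges rewired
ws_example <- igraph::sample_smallworld(dim = 1, size = g, nei = 5, p = 0.1)

# Barabasi-Albert: preferential attachment, each new node sends 11 edges (directed)
b_example <- igraph::sample_pa(n = g, m = 11, directed = TRUE)

print(summary(igraph::degree(er_example)))
```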

Standardized Centrality Measures Transform Raw Scores to Comparable Values on a 0 to 1 scale

  1. Raw Measures reflect a given characteristic that is a function of the size or boundaries of the network
  2. Standardized Measures take on the general form of: \[\text{Std. Cent.} = \frac{\text{raw centrality}}{\text{max possible numerator | net. characteristics}} \\ \rightarrow 0 \leq S_x \leq 1\]

Standardized Degree Centrality counts the proportion of nodes to which \(n_i\) is adjacent

\[ d'_i = \frac{d_i}{g - 1} \]

  • \(g =\) number of nodes in network
  • Possible values independent of \(g\)
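
A small worked example of the standardization, using a six-node star (a hypothetical graph chosen because its degrees are easy to read off):

```r
star <- igraph::make_star(6, mode = "undirected")   # node 1 is the hub
g <- igraph::vcount(star)

d_raw <- igraph::degree(star)     # hub has 5 ties, each leaf has 1
d_std <- d_raw / (g - 1)          # hub: 5 / 5 = 1, leaves: 1 / 5 = 0.2

# igraph applies the same division when normalized = TRUE
d_std_igraph <- igraph::degree(star, normalized = TRUE)

print(d_std)
```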

Centralization Measures graph-level variation in individual centrality scores

  • Centralization Measures take on the general form of: \[\text{Centralization} = \frac{\sum_{i=1}^g [\text{Centrality}_{max} - \text{Centrality}_i]}{\text{max possible numerator | net. characteristics}} \\ \rightarrow 0 \leq S_x \leq 1\]

Degree Centralization increases with variation in individual degree scores

\[ D = \frac{\sum_{i=1}^g [d_{max} - d_i]}{(g - 2)(g - 1)} \]

  • D = 1 when a single node is adjacent to all other nodes and no nodes form any other edges
  • D = 0 when all nodes have the same degree

igraph::centr_degree(er_example)$centralization
## [1] 0.07474747
igraph::centr_degree(ws_example)$centralization
## [1] 0.03030303
igraph::centr_degree(b_example)$centralization
## [1] 0.3536374
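
To check the formula for \(D\) by hand, the sketch below computes it on two hypothetical ten-node graphs, a star and a ring. The helper `degree_centralization()` is written for this example and is not an igraph function. One caveat: igraph's denominator depends on the `loops` argument, and the values printed above appear to come from a setting that allows self-loops, so they need not match the \((g - 2)(g - 1)\) denominator exactly. Passing `loops = FALSE` requests the simple-graph version.

```r
# Illustrative helper implementing D for an undirected graph.
degree_centralization <- function(graph) {
  d <- igraph::degree(graph)
  g <- igraph::vcount(graph)
  sum(max(d) - d) / ((g - 2) * (g - 1))
}

star <- igraph::make_star(10, mode = "undirected")
ring <- igraph::make_ring(10)

D_star <- degree_centralization(star)   # one hub, no other edges
D_ring <- degree_centralization(ring)   # every node has degree 2

# loops = FALSE asks igraph for the simple-graph normalization
D_star_igraph <- igraph::centr_degree(star, loops = FALSE)$centralization

print(c(star = D_star, ring = D_ring))
```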

Density is the average Standardized Degree

  • measures group cohesion
  • fraction of possible edges present
  • does not measure centralization
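
The sketch below illustrates both points on hypothetical graphs: density equals the mean standardized degree, and two graphs with identical density can differ sharply in centralization.

```r
set.seed(1)   # arbitrary seed
net <- igraph::sample_gnp(n = 30, p = 0.2)   # made-up example graph

density <- igraph::edge_density(net)
mean_std_degree <- mean(igraph::degree(net, normalized = TRUE))

# A star and a path on 10 nodes both have 9 edges, hence the same density ...
star <- igraph::make_star(10, mode = "undirected")
path <- igraph::make_ring(10, circular = FALSE)
same_density <- igraph::edge_density(star) == igraph::edge_density(path)

# ... but very different degree centralization
D_star <- igraph::centr_degree(star, loops = FALSE)$centralization
D_path <- igraph::centr_degree(path, loops = FALSE)$centralization

print(c(density = density, mean_std_degree = mean_std_degree))
```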

Closeness Centrality measures a node’s proximity to all other nodes in the network

\[ c_i = [\sum_{j=1}^g d_{ij}]^{-1} \]

  • Closeness inversely related to the lengths of the shortest paths between pairs of nodes, \(d_{ij}\)
  • Depends on direct and indirect ties
  • Only meaningful for connected graphs
  • If \(d_{ij} = \infty\), can set to \(g\)
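
A minimal sketch of the closeness formula on a hypothetical five-node path, computed directly from the matrix of shortest-path lengths and compared with igraph's raw closeness:

```r
path <- igraph::make_ring(5, circular = FALSE)   # 1 - 2 - 3 - 4 - 5

d_ij <- igraph::distances(path)        # matrix of shortest-path lengths
c_manual <- 1 / rowSums(d_ij)          # d_ii = 0, so the diagonal adds nothing

c_igraph <- igraph::closeness(path)    # raw (unstandardized) closeness

print(c_manual)
```

The middle node has total distance 2 + 1 + 1 + 2 = 6 and the end nodes have 1 + 2 + 3 + 4 = 10, so the middle node is the closest.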

Closeness Centrality can be difficult to interpret

  • A large closeness score could mean:
    • Very close to a subset of the network
    • Moderately close to the whole network

Standardized Closeness Centrality measures the inverse average distance between actor i and all other actors

\[ c'_i = (g - 1)c_i \]

  • ranges from zero to one
  • \(c'_i = 1\) when \(n_i\) is adjacent to all other actors
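
A worked example on a hypothetical six-node star shows the upper bound being reached by the hub:

```r
star <- igraph::make_star(6, mode = "undirected")   # node 1 is the hub
g <- igraph::vcount(star)

c_std <- (g - 1) * igraph::closeness(star)
# hub: adjacent to every other node, so its score is 5 / 5
# leaf: one step to the hub, two steps to each of the 4 other leaves,
#       total distance 1 + 2 * 4 = 9, so its score is 5 / 9

c_std_igraph <- igraph::closeness(star, normalized = TRUE)

print(c_std)
```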

#igraph::closeness(er_example, normalized = TRUE)  %>% summary()
igraph::closeness(ws_example, normalized = TRUE)  %>% summary()
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3333  0.3623  0.3808  0.3826  0.3992  0.4605
igraph::closeness(b_example, normalized = TRUE, mode = c("in"))  %>% summary()
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.01000 0.01000 0.01020 0.03891 0.01285 0.91670
igraph::closeness(b_example, normalized = TRUE, mode = c("all"))  %>% summary()
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.5294  0.5294  0.5323  0.5668  0.5470  0.9167

Closeness Centralization

\[ C = \frac{\sum_{i=1}^g [c'_{max} - c'_i]}{(g - 2)(g - 1)/(2g - 3)} \]

  • as with Degree Centralization, the denominator equals the largest possible value for the numerator
  • again, only appropriate for connected graphs

igraph::centr_clo(er_example, normalized = TRUE)$centralization 
## [1] 0.1322996
igraph::centr_clo(ws_example, normalized = TRUE)$centralization 
## [1] 0.1581056
igraph::centr_clo(b_example, normalized = TRUE, mode = c("all"))$centralization 
## [1] 0.7104668
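
The claim that the denominator equals the largest possible numerator can be checked on a star, which is the most centralized connected graph. The helper `closeness_centralization()` is written for this sketch and is not an igraph function. The ten-node graphs are hypothetical.

```r
# Illustrative helper implementing C for a connected, undirected graph.
closeness_centralization <- function(graph) {
  g <- igraph::vcount(graph)
  c_std <- igraph::closeness(graph, normalized = TRUE)
  sum(max(c_std) - c_std) / ((g - 2) * (g - 1) / (2 * g - 3))
}

star <- igraph::make_star(10, mode = "undirected")
ring <- igraph::make_ring(10)

C_star <- closeness_centralization(star)   # numerator hits its maximum
C_ring <- closeness_centralization(ring)   # all nodes equally close

C_star_igraph <- igraph::centr_clo(star)$centralization

print(c(star = C_star, ring = C_ring))
```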

Betweenness Centrality of actor \(i\) sums, over all pairs of nodes \(j\) and \(k\), the proportion of shortest paths between them that pass through \(i\)

\[ B (n_i) = \sum_{j < k;\ j, k \neq i} \frac{g_{jk}(n_i)}{g_{jk}} \]

where:

  • \(g_{jk}\) is the total number of shortest paths from node \(j\) to node \(k\)
  • \(g_{jk}(n_i)\) is the number of those paths that pass through node \(i\)
  • can be computed even for disconnected graphs

Betweenness Centrality assumes that information, contact, or other communication between actors travels along the shortest paths with equal probability
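
Two small hypothetical graphs make the definition and the equal-probability assumption concrete. In the path, every pair has a single shortest path. In the four-cycle, opposite corners are joined by two shortest paths, and each one is credited with half.

```r
path <- igraph::make_ring(5, circular = FALSE)    # 1 - 2 - 3 - 4 - 5
b_path <- igraph::betweenness(path, directed = FALSE)
# node 3 sits on the only shortest path for the pairs
# (1,4), (1,5), (2,4), (2,5), so its score is 4

cycle <- igraph::make_ring(4)                     # 1 - 2 - 3 - 4 - 1
b_cycle <- igraph::betweenness(cycle, directed = FALSE)
# nodes 2 and 4 are linked by 2-1-4 and 2-3-4; nodes 1 and 3 each
# carry half of that pair, and the same holds for the pair (1,3)

print(b_path)
print(b_cycle)
```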

Standardized Betweenness Centrality

\[ B' (n_i) = \frac{B (n_i) }{[(g - 1)(g - 2)/2]} \]

igraph::betweenness(er_example, normalized = TRUE, directed = FALSE)  %>% 
  summary()
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.0006065 0.0066530 0.0106600 0.0121400 0.0148100 0.0381700
igraph::betweenness(ws_example, normalized = TRUE, directed = FALSE) %>% 
  summary()
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.001963 0.005955 0.012500 0.016590 0.023800 0.065370
igraph::betweenness(b_example, normalized = TRUE, directed = TRUE) %>% 
  summary()
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.0000000 0.0000000 0.0006321 0.0012750 0.0018810 0.0072570
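
For an undirected graph, the standardization can be reproduced by hand. The sketch uses a hypothetical six-node star, where the hub lies on the shortest path between every pair of leaves and so reaches the maximum.

```r
star <- igraph::make_star(6, mode = "undirected")   # node 1 is the hub
g <- igraph::vcount(star)

b_raw <- igraph::betweenness(star, directed = FALSE)
b_std <- b_raw / ((g - 1) * (g - 2) / 2)   # divide by the number of other pairs

b_std_igraph <- igraph::betweenness(star, directed = FALSE, normalized = TRUE)

print(rbind(raw = b_raw, standardized = b_std))
```

The hub's raw score is 10, the number of leaf pairs, which standardizes to 1.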

Betweenness Centralization

\[ B = \frac{2\sum_{i=1}^g[b_{max} - b_i]}{[(g-1)^2(g-2)]} \]
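
A minimal check of this formula, reading \(b_i\) as the raw (unstandardized) betweenness of node \(i\). The helper `betweenness_centralization()` is written for this sketch and is not an igraph function. The ten-node star is a hypothetical example.

```r
# Illustrative helper implementing B for an undirected graph.
betweenness_centralization <- function(graph) {
  g <- igraph::vcount(graph)
  b <- igraph::betweenness(graph, directed = FALSE)   # raw scores
  2 * sum(max(b) - b) / ((g - 1)^2 * (g - 2))
}

star <- igraph::make_star(10, mode = "undirected")
ring <- igraph::make_ring(10)

B_star <- betweenness_centralization(star)
B_ring <- betweenness_centralization(ring)

B_star_igraph <- igraph::centr_betw(star, directed = FALSE)$centralization

print(c(star = B_star, ring = B_ring))
```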

Eigenvector Centrality weighs a node’s ties by the centrality of the nodes they are connected to

  • Other measures treat each tie equally regardless of the identity of the nodes in question
  • Eigenvector Centrality values a tie to a highly central actor over a tie to a less central actor

Eigenvector Centrality weighs a node’s ties by the centrality of the nodes they are connected to

\[ e_i = \frac{1}{\lambda}\sum_{j: j \neq i} x_{ij}e_j \]

where:

  • \(e\) is the leading eigenvector of the sociomatrix (the one paired with the greatest eigenvalue)
  • \(x_{ij}\) is the value of cell [i,j] in the sociomatrix
  • \(\frac{1}{\lambda}\) is the reciprocal of the greatest eigenvalue of the sociomatrix
  • \(e_j\) is node j’s centrality score
  • \(e_i\) already scaled to range from zero to one
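
The definition can be reproduced with base R's `eigen()`. The sketch below uses a made-up four-node network and compares the result with `igraph::eigen_centrality()`, the current name for `evcent()`.

```r
# Hypothetical network: a triangle (1, 2, 3) with node 4 attached to node 3.
el <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4))
net <- igraph::graph_from_edgelist(el, directed = FALSE)
x <- igraph::as_adjacency_matrix(net, sparse = FALSE)   # sociomatrix

eig <- eigen(x, symmetric = TRUE)
lambda <- eig$values[1]            # greatest eigenvalue
e <- abs(eig$vectors[, 1])         # leading eigenvector (its sign is arbitrary)
e <- e / max(e)                    # rescale so the top score is 1

# each score equals the neighbours' scores, summed and divided by lambda
check <- as.vector(x %*% e) / lambda

e_igraph <- igraph::eigen_centrality(net)$vector

print(round(e, 3))
```

Node 3 scores highest. Node 4 has only one tie, but that tie is to the most central node.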

igraph::evcent(er_example,  directed = FALSE)$vector  %>%   summary()
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1409  0.4003  0.5189  0.5281  0.6252  1.0000
igraph::evcent(b_example,  directed = TRUE)$vector %>%   summary()
## Warning in .Call("R_igraph_eigenvector_centrality", graph, directed,
## scale, : At centrality.c:344 :graph is directed and acyclic; eigenvector
## centralities will be zeros
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0       0       0       0       0       0
igraph::evcent(b_example,  directed = FALSE)$vector %>%   summary()
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1692  0.2430  0.2681  0.3406  0.3107  1.0000

Eigenvector Centralization

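\[ E = \frac{\sum_{i = 1}^g(e_{max}-e_i)}{\max \sum_{i = 1}^g(e_{max}-e_i)} \]

  • as with the other centralization measures, the denominator is the largest value the numerator can take for a network of \(g\) nodes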

igraph::centr_eigen(er_example)$centralization 
## [1] 0.4815106
igraph::centr_eigen(ws_example)$centralization 
## [1] 0.309248
igraph::centr_eigen(b_example, directed = FALSE)$centralization 
## [1] 0.6728771

THE EFFECT OF SAMPLING ON CENTRALITY SCORES

  1. Draw random network of size g = 1000
  2. Calculate node- and network-level statistics
  3. Create three samples by randomly sampling .5, .75, and .9 of the edges
  4. Compare sample estimates to true whole-network values
  5. Repeat \(\times 100\)
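
The steps above can be sketched as follows for edge sampling on Erdos-Renyi graphs. This is a scaled-down illustration, not the simulation behind the results that follow. The network size (200 rather than 1000), the tie probability, the number of repetitions (10 rather than 100), the seed, and the two statistics tracked are all assumptions made to keep the example short. `compare_edge_sample()` and `run_once()` are hypothetical helper names.

```r
set.seed(42)   # arbitrary seed for reproducibility

# Keep a random fraction of the edges, then compare sample to full-network values.
compare_edge_sample <- function(graph, frac) {
  n_edges <- igraph::ecount(graph)
  drop <- sample(n_edges, size = round((1 - frac) * n_edges))
  sampled <- igraph::delete_edges(graph, drop)   # node set is unchanged
  c(
    frac = frac,
    # node level: how well does sampled degree track true degree?
    degree_cor = cor(igraph::degree(graph), igraph::degree(sampled)),
    # network level: sampled density minus true density
    density_bias = igraph::edge_density(sampled) - igraph::edge_density(graph)
  )
}

# Steps 1 to 4 for a single random network
run_once <- function(g = 200, p = 0.05) {
  full <- igraph::sample_gnp(n = g, p = p)
  t(sapply(c(0.5, 0.75, 0.9), function(f) compare_edge_sample(full, f)))
}

# Step 5: repeat and average within each sampling fraction
results <- do.call(rbind, replicate(10, run_once(), simplify = FALSE))
summary_by_frac <- aggregate(cbind(degree_cor, density_bias) ~ frac,
                             data = as.data.frame(results), FUN = mean)

print(summary_by_frac)
```

Because edges are removed while the node set stays fixed, sampled density can only fall below the true value in this setup.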

Sampling Effects on Node-Level Measures: 1. Erdos-Renyi Graphs

  • Edges
    • Robust to sampling: Betweenness, Eigencentrality
  • Nodes
    • Robust to sampling: Closeness, Eigencentrality

Sampling Effects on Node-Level Measures: 2. Watts-Strogatz Graphs

  • Edges
    • Robust to sampling: Betweenness
  • Nodes
    • Robust to sampling:

Sampling Effects on Node-Level Measures: 3. Barabasi Graphs

  • Edges
    • Robust to sampling: Eigencentrality
  • Nodes
    • Robust to sampling: Closeness, Eigencentrality

Sampling Effects on Network-Level Measures: 1. Erdos-Renyi Graphs

  • Edges
    • Overestimate: Betweenness, Eigencentralization, Mean Distance, Closeness
    • Underestimate: Degree, Density, Closeness
  • Nodes
    • Overestimate: Betweenness, Closeness, Degree, Eigencentralization
    • Underestimate:

Sampling Effects on Network-Level Measures: 2. Watts-Strogatz Graphs

  • Edges
    • Overestimate: Betweenness, Closeness, Degree, Eigencentralization, Mean Distance
    • Underestimate: Density
  • Nodes
    • Overestimate: Betweenness, Closeness, Degree, Eigencentralization, Mean Distance
    • Underestimate:

Sampling Effects on Network-Level Measures: 3. Barabasi Graphs

  • Edges
    • Overestimate: Betweenness, Eigencentralization, Mean Distance
    • Underestimate: Closeness, Density, Degree
  • Nodes
    • Overestimate:
    • Underestimate:

Assignment 2

  1. Find whole-network data on two networks of similar type
    • May be same actors and types of relationships at different time points
    • May be different actors with same type of relationships
    • May be same actors with different type of relationships
  2. Briefly describe the data: what do the nodes and edges represent?
    • Provide a sociogram of each network
  3. Explain which measures of centrality/centralization seem most appropriate for these networks
  4. Use these measures to describe the networks and compare/contrast them
  5. Discuss how strongly other measures of centrality correlate with those you chose to focus on
    • Include tables or graphs to help summarize the data
  6. Remove at random 30% of the nodes from the network. Describe the extent to which your selected measures of centrality and centralization change.
    • (you need only discuss the measures you deemed most appropriate for this type of network)
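
As a rough starting point for steps 5 and 6, the sketch below uses a simulated graph as a stand-in for real data. The graph, the seed, and the choice of degree centralization as the "selected" measure are placeholders, not recommendations.

```r
set.seed(7)   # arbitrary seed
net <- igraph::sample_gnp(n = 100, p = 0.1)   # stand-in for a real network

# Step 5: correlations among node-level measures
measures <- data.frame(
  degree      = igraph::degree(net),
  closeness   = igraph::closeness(net, normalized = TRUE),
  betweenness = igraph::betweenness(net, directed = FALSE, normalized = TRUE),
  eigenvector = igraph::eigen_centrality(net)$vector
)
cor_table <- round(cor(measures), 2)

# Step 6: drop 30% of the nodes at random, then recompute the chosen measure
n <- igraph::vcount(net)
removed <- sample(n, size = round(0.3 * n))
reduced <- igraph::delete_vertices(net, removed)

centralization_change <- c(
  full    = igraph::centr_degree(net, loops = FALSE)$centralization,
  reduced = igraph::centr_degree(reduced, loops = FALSE)$centralization
)

print(cor_table)
print(centralization_change)
```

A single random draw can be unrepresentative, so repeating the removal several times and summarizing the spread may give a fairer picture.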